Abstract. This paper investigates the detection of information hidden 
by the Least Significant Bit (LSB) matching scheme. In a theoretical 
context of known image media parameters, two important results are 
presented. First, the use of hypothesis testing theory allows us to de- 
sign the Most Powerful (MP) test. Second, a study of the MP test gives 
us the opportunity to analytically calculate its statistical performance 
in order to warrant a given probability of false-alarm. In practice when 
detecting LSB matching, the unknown image parameters have to be es- 
timated. Based on the local estimator used in the Weighted Stego-image 
(WS) detector, a practical test is presented. A numerical comparison with 
state-of-the-art detectors shows the good performance of the proposed 
tests and highlights the relevance of the proposed methodology. 



1 Introduction and Contributions. 

Steganography and steganalysis form a cat-and-mouse game. On the one hand, 
steganography aims at hiding the very presence of a secret message by hiding it 
within an innocuous cover medium. On the other hand, the goal of steganalysis 
(in the wide sense) is to obtain any information about the potential stegano- 
graphic system from an unknown medium. Usually, steganalysis focuses on ex- 
posing the existence of a hidden message in an inspected medium. 
Many steganographic tools are now easily available on the Internet, putting 
steganography within the reach of anyone, for legitimate or malicious usage. It 
is thus crucial for security forces to be able to reliably detect steganographic 
content among a (possibly very large) set of media files. In this operational 
context, the detection of a rather simple but most commonly found stegosys- 
tem seems more important than the detection of a very complex but rarely 
encountered stegosystem. The vast majority of downloadable steganographic 
tools insert the secret information in the LSB plane. Consequently, substan- 
tial progress has recently been made in the detection of such steganographic 
algorithms, namely LSB replacement and LSB matching, also known as LSB ±1 
embedding (see [1111511] and the references therein). However, the steganalysis 
of LSB matching remains much harder than the steganalysis of LSB replacement. 
Indeed, if LSB matching is used instead of LSB replacement, the detection power 
of state-of-the-art detectors is significantly lower |25I5| . 

The recently proposed steganalyzers dedicated to LSB matching can be 
roughly divided into two categories. On the one hand, most of the latest de- 
tectors are based on supervised machine learning methods and use targeted [614] 
or universal features [17 23 . As in all applications of machine learning, the the- 
oretical calculation of error probabilities remains an open problem [24 . On the 
other hand, the authors of [TB] observed that LSB matching acts as a low-pass 
filter on the image Histogram Characteristic Function (HCF). This pioneering 
work led to an entire family of histogram-based detectors |T9 25J . 

In the operational context described above, the proposed steganalyzer must 
be immediately applicable without any training or tuning phase. For this reason, 
the use of a machine learning based detector is hardly possible. Moreover, the 
most important challenge for the steganalyst is to provide detection algorithms 
with an analytical expression for the false-alarm and missed-detection probabil- 
ities without which the "uncertainty" of the result can not be "measured." The 
proposed LSB matching steganalyzers are certainly very interesting and efficient, 
but these ad hoc algorithms have been designed with a very limited exploitation 
of statistical cover models and hypothesis testing theory. Hence, few theoretical 
results exist and the only way to measure their statistical performance 
is simulation on large databases. 

Alternatively, the first step in the direction of hypothesis testing has been 
made in [121819] for LSB replacement to design a statistical test with known 
statistical properties. In the present paper, this statistical approach is extended 
to the case of detecting LSB matching. More precisely, the goal of this paper is 
threefold: 

1. Define the most powerful (MP) test in the theoretical case when the cover 
image parameters are known, namely the expectation and noise variance of 
each pixel. 

2. Analytically calculate the statistical performance of the MP test in terms 
of the false-alarm and missed-detection probabilities. More importantly, this 
result allows us to highlight the impact of the noise variance and quantization 
on the test performance [H]. 

3. Design a practical efficient implementation of this test based on a simple 
local estimation of expectation and variance of each pixel. 

The paper is organized as follows. The problem of LSB matching steganalysis 
is cast within the framework of hypothesis testing in Section 2. Following the 
Neyman-Pearson approach, the MP Likelihood Ratio Test (LRT) is presented in 
Section 3 and its statistical performance is calculated in Section 4. Finally, the 
proposed practical implementation of the Generalized LRT (GLRT) is presented 
in Section 5. To show the relevance of the proposed approach, numerical results 
on large natural image databases are shown in Section 6. Section 7 concludes 
the paper. 

2 Detection of LSB Matching Problem Statement. 

This paper mainly focuses on natural images but the extension of the presented 
results to any kind of digital media is immediate. Hence, the column vector 
$C = (c_1, \ldots, c_N)^T$ represents in this paper a cover image of 
$N = N_x \times N_y$ grayscale pixels. The set of grayscale levels is denoted 
$\mathcal{Z} = \{0; \ldots; 2^B - 1\}$ as pixel values are usually unsigned 
integers encoded with $B$ bits. Each cover pixel $c_n$ results from the quantization: 

$$c_n = Q(y_n), \tag{1}$$

where $y_n \in \mathbb{R}^+$ denotes the raw pixel intensity recorded by the camera and $Q$ 
represents the uniform quantization with a unitary step: 

$$Q(x) = k \;\Leftrightarrow\; x \in [k - 1/2;\, k + 1/2[.$$

For simplicity, it is assumed in this paper that the saturation effect is absent, 
i.e. the probability of exceeding the quantizer boundaries $-1/2$ and $2^B - 1 + 1/2$ is 
negligible. Indeed, taking into account the under- or over-exposed pixels is rather 
simple but requires a much more complicated notation. 
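The recorded pixel value can be decomposed as |13I7| : 

$$y_n = \theta_n + \xi_n, \tag{2}$$

where $\theta_n$ is a deterministic parameter corresponding to the mathematical 
expectation of $y_n$ and $\xi_n$ is a random variable representing all the noise 
corrupting the cover image during acquisition. As described in [13], $\xi_n$ is 
accurately modeled as a realization of a zero-mean Gaussian random variable 
$\Xi_n \sim \mathcal{N}(0, \sigma_n^2)$ whose variance $\sigma_n^2$ varies from pixel to pixel. 
It thus follows from (1) and (2) that $c_n$ follows a distribution 
$P_{\theta_n} = P_{\theta_n,\sigma_n} = (p_{\theta_n}[0], \ldots, p_{\theta_n}[2^B - 1])$ defined by: 

$$\forall k \in \mathcal{Z}, \quad p_{\theta_n}[k] = \Phi\left(\frac{k + 1/2 - \theta_n}{\sigma_n}\right) - \Phi\left(\frac{k - 1/2 - \theta_n}{\sigma_n}\right), \tag{3}$$

where $\Phi$ is the standard Gaussian cumulative distribution function (cdf) defined 
by $\Phi(x) = \int_{-\infty}^{x} \phi(u)\,du$ and $\phi$ is the standard Gaussian probability 
density function (pdf) $\phi(u) = \frac{1}{\sqrt{2\pi}} \exp(-u^2/2)$. By virtue of the mean 
value theorem, (3) can be written as: 

$$p_{\theta_n}[k] = \int_{k-1/2}^{k+1/2} \frac{1}{\sigma_n}\,\phi\left(\frac{u - \theta_n}{\sigma_n}\right) du = \frac{1}{\sigma_n}\,\phi\left(\frac{k - \theta_n}{\sigma_n}\right) + \epsilon, \tag{4}$$

where $\epsilon$ is a (small) corrective term [26]. 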



To statistically model stego-image pixels from (3)-(4), the two following 
assumptions are usually adopted |12I14| : 1) the probability of insertion is equal 
for every cover pixel (independence between hidden bits and cover pixels) and 
2) the message $M = (m_1, \ldots, m_L)^T$ is assumed compressed and/or ciphered 
before insertion. Hence, each hidden bit $m_l$ is drawn from a binomial 
distribution $\mathcal{B}(1, 1/2)$, i.e. $m_l$ is either 0 or 1 with the same probability. 
This situation is captured by denoting 

$$\forall n \in \{1, \ldots, N\}, \quad \begin{cases} \mathbb{P}[s_n = c_n] = 1 - R, \\ \mathbb{P}[s_n = c_n + \mathrm{ins}(m_n, c_n)] = R, \end{cases} \tag{5}$$

where $S = \{s_1, \ldots, s_N\}$ are the values of stego-image pixels, the embedding 
rate $R = L/N$ corresponds to the number of hidden bits per cover pixel and 
$\mathrm{ins}(m_n, c_n)$ represents the value added to $c_n$ to insert the hidden bit $m_n$. 
The particularity of LSB matching lies in its insertion function 
$\mathrm{ins} : \{0; 1\} \times \mathcal{Z} \to \{-1; 0; 1\}$. Whenever the LSB of $c_n$ is equal to $m_n$, 
i.e. when $\mathrm{lsb}(c_n) = c_n \bmod 2 = m_n$, there is no need to change $c_n$, hence 
$\mathrm{ins}(m_n, c_n) = 0$. On the contrary, whenever $\mathrm{lsb}(c_n) \neq m_n$, the insertion 
must change the LSB of $c_n$, which is done by adding or subtracting 1 with the 
same probability: 

$$\begin{cases} \mathbb{P}[\mathrm{ins}(m_n, c_n) = 1 \mid \mathrm{lsb}(c_n) \neq m_n] = 1/2, \\ \mathbb{P}[\mathrm{ins}(m_n, c_n) = -1 \mid \mathrm{lsb}(c_n) \neq m_n] = 1/2. \end{cases} \tag{6}$$
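The insertion rule can be sketched in a few lines. This is only an illustration under stated assumptions: the message is embedded sequentially in the first pixels (rather than along a key-driven path), and pixels sitting on the quantizer bounds are forced to move inward, a case the paper leaves aside by neglecting saturation. The function name `lsb_matching_embed` is hypothetical.

```python
import random


def lsb_matching_embed(cover, message_bits, bits=8, rng=random):
    """Embed message_bits into the first len(message_bits) pixels of cover."""
    top = 2 ** bits - 1
    stego = list(cover)
    for n, m in enumerate(message_bits):
        c = stego[n]
        if c % 2 == m:      # lsb(c_n) = m_n: nothing to change, ins = 0
            continue
        if c == 0:          # assumption: forced +1 at the lower bound
            step = 1
        elif c == top:      # assumption: forced -1 at the upper bound
            step = -1
        else:               # +1 or -1 with probability 1/2 each
            step = rng.choice((-1, 1))
        stego[n] = c + step
    return stego


rng = random.Random(0)
cover = [rng.randrange(256) for _ in range(1000)]
message = [rng.randrange(2) for _ in range(500)]  # embedding rate R = 0.5
stego = lsb_matching_embed(cover, message, rng=rng)
```

After embedding, the LSB of each used pixel equals the corresponding message bit and no pixel has moved by more than one grey level.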

Since each hidden bit $m_n$ follows the binomial distribution $\mathcal{B}(1, 1/2)$, a 
straightforward calculation finally shows that 
$\mathbb{P}[\mathrm{lsb}(c_n) = m_n] = \mathbb{P}[\mathrm{lsb}(c_n) \neq m_n] = 1/2$. 
Hence, as described in |18I25I6|ID] , it follows from (5)-(6) that for all 
$n \in \{1, \ldots, N\}$, the pmf of the stego-pixel $s_n$ after embedding at rate $R$ with LSB 
matching is given by $Q^R_{\theta_n} = (q^R_{\theta_n}[0], \ldots, q^R_{\theta_n}[2^B - 1])$ with, $\forall k \in \mathcal{Z}$: 

$$q^R_{\theta_n}[k] = \frac{R}{4}\left(p_{\theta_n}[k-1] + p_{\theta_n}[k+1]\right) + \left(1 - \frac{R}{2}\right) p_{\theta_n}[k]. \tag{7}$$
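The stego pmf is a simple mixture of the cover pmf and its two unit shifts: a pixel is left unchanged with probability 1 - R/2 and moved by +1 or -1 with probability R/4 each. A minimal sketch follows, assuming that mass outside the set of grey levels is zero (consistent with neglecting saturation); the toy pmf is hypothetical.

```python
def stego_pmf(p, rate):
    """Stego pmf: q[k] = R/4 * (p[k-1] + p[k+1]) + (1 - R/2) * p[k]."""
    size = len(p)
    q = []
    for k in range(size):
        below = p[k - 1] if k > 0 else 0.0         # p[k-1], zero outside Z
        above = p[k + 1] if k < size - 1 else 0.0  # p[k+1], zero outside Z
        q.append(rate / 4 * (below + above) + (1 - rate / 2) * p[k])
    return q


# Toy cover pmf on 8 grey levels with no mass on the boundary levels.
p = [0.0, 0.05, 0.2, 0.5, 0.2, 0.05, 0.0, 0.0]
q = stego_pmf(p, rate=1.0)
print(q)
```

The embedding flattens the pmf: the mode loses mass to its neighbours while the total mass is preserved.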



3 Likelihood Ratio Test (LRT) for two simple hypotheses. 

When analyzing an unknown medium $Z$, the first goal of LSB matching 
steganalysis is to decide between the two following hypotheses: 

$$\begin{cases} \mathcal{H}_0 = \{z_n \sim P_{\theta_n},\ \forall n \in \{1, \ldots, N\}\} \\ \text{vs } \mathcal{H}_1 = \{z_n \sim Q^R_{\theta_n},\ \forall n \in \{1, \ldots, N\}\}. \end{cases} \tag{8}$$

Let us start with the simplest case, when the embedding rate $R$ and, for all 
$n$, the parameters $\theta_n$ and $\sigma_n$ are known. In this case, the hypothesis testing 
problem (8) is reduced to a test between two simple hypotheses. 
The goal is to find a test $\delta : \mathcal{Z}^N \to \{\mathcal{H}_0, \mathcal{H}_1\}$, such that hypothesis 
$\mathcal{H}_i$ is accepted if $\delta(Z) = \mathcal{H}_i$ (see [22] for details about statistical hypothesis 
testing). However, as explained in the introduction, in an operational forensics 
context the most important challenge is first, to warrant a prescribed (very low) 
false-alarm probability and second, to maximize the detection power defined by: 

$$\beta_\delta = \mathbb{P}_1[\delta(Z) = \mathcal{H}_1],$$

where $\mathbb{P}_i(\cdot)$ stands for the probability under hypothesis $\mathcal{H}_i$, $i \in \{0; 1\}$. 
Therefore, let $\mathcal{K}_{\alpha_0}$ be the class of tests with a false-alarm probability 
upper-bounded by $\alpha_0$, defined by 

$$\mathcal{K}_{\alpha_0} = \{\delta : \mathbb{P}_0[\delta(Z) = \mathcal{H}_1] \leq \alpha_0\}. \tag{9}$$

By virtue of the Neyman-Pearson lemma, see [22, Theorem 3.2.1], the most 
powerful (MP) test over the class $\mathcal{K}_{\alpha_0}$ (9) is the LRT given by the following 
decision rule: 

$$\delta^R(Z) = \begin{cases} \mathcal{H}_0 & \text{if } \Lambda^R(Z) \leq \tau_{\alpha_0}, \\ \mathcal{H}_1 & \text{if } \Lambda^R(Z) > \tau_{\alpha_0}, \end{cases} \tag{10}$$

where $\tau_{\alpha_0}$ is the solution of $\mathbb{P}_0[\Lambda^R(Z) > \tau_{\alpha_0}] = \alpha_0$, to ensure that 
$\delta^R \in \mathcal{K}_{\alpha_0}$, and the likelihood ratio (LR) $\Lambda^R(Z)$ is given, from the statistical 
independence between pixels, by: 

$$\Lambda^R(Z) = \prod_{n=1}^{N} \Lambda^R(z_n) = \prod_{n=1}^{N} \left[ \frac{R}{4}\, \frac{p_{\theta_n}[z_n - 1] + p_{\theta_n}[z_n + 1]}{p_{\theta_n}[z_n]} + \left(1 - \frac{R}{2}\right) \right]. \tag{11}$$

It can be noted that $\Lambda^R(z_n)$ depends on the pixel value $z_n$ only through the quantity: 

$$\Lambda_2(z_n) = \frac{1}{2}\, \frac{p_{\theta_n}[z_n - 1] + p_{\theta_n}[z_n + 1]}{p_{\theta_n}[z_n]}, \tag{12}$$

which corresponds to the likelihood ratio for the conceptual case of $R = 2$. In 
other words, Equation (12) corresponds to this test: $\mathcal{H}_0$ : { $Z$ is a cover medium } 
vs $\mathcal{H}_1$ : { each pixel of $Z$ is modified by $\pm 1$ }. Indeed, considering the case $R = 2$ 
permits us to clarify the present methodology, which is then extended to the 
more general case of $R \in ]0; 1[$ in Section 4.2. 



The exact expression for the LR $\Lambda_2(z_n)$ is complicated due to the corrective 
terms $\epsilon$ defined in (4). However, the calculation shows that these corrective terms 
are usually negligible, particularly when $\sigma_n > 1$. Therefore, it is proposed to 
neglect $\epsilon$ in order to obtain a simplified expression for the LR $\Lambda_2(z_n)$. From (4), 
this approximation permits us to write: 

$$\frac{p_{\theta_n}[z_n - 1]}{p_{\theta_n}[z_n]} = \exp\left(-\frac{1}{2\sigma_n^2}\right) \exp\left(\frac{z_n - \theta_n}{\sigma_n^2}\right), \qquad \frac{p_{\theta_n}[z_n + 1]}{p_{\theta_n}[z_n]} = \exp\left(-\frac{1}{2\sigma_n^2}\right) \exp\left(-\frac{z_n - \theta_n}{\sigma_n^2}\right). \tag{13}$$
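For completeness, the two ratios in (13) follow from (4), once the corrective term is neglected, by expanding the square in the Gaussian exponent. The worked step below treats both shifts at once, using $(z_n \pm 1 - \theta_n)^2 - (z_n - \theta_n)^2 = \pm 2 (z_n - \theta_n) + 1$:

```latex
\frac{p_{\theta_n}[z_n \pm 1]}{p_{\theta_n}[z_n]}
\approx \exp\!\left(-\frac{(z_n \pm 1-\theta_n)^2-(z_n-\theta_n)^2}{2\sigma_n^2}\right)
= \exp\!\left(-\frac{1}{2\sigma_n^2}\right)
  \exp\!\left(\mp\frac{z_n-\theta_n}{\sigma_n^2}\right).
```

As a sanity check on the signs: when $z_n > \theta_n$, the level $z_n - 1$ is closer to the expectation than $z_n$, so its ratio must exceed the common factor $\exp(-1/(2\sigma_n^2))$, which is what the positive exponent gives.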



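Finally, using (13), the LR $\Lambda_2(z_n)$ can be written as: 

$$\Lambda_2(z_n) = \frac{1}{2} \exp\left(-\frac{1}{2\sigma_n^2}\right) \left[ \exp\left(\frac{z_n - \theta_n}{\sigma_n^2}\right) + \exp\left(-\frac{z_n - \theta_n}{\sigma_n^2}\right) \right]. \tag{14}$$

The logarithm of the likelihood ratio (14) is usually preferred in order to replace 
the product in (11) with a sum. From (14), it immediately follows that: 

$$\log\left(\Lambda_2(z_n)\right) + \log(2) + \frac{1}{2\sigma_n^2} = \log\left( \exp\left(\frac{z_n - \theta_n}{\sigma_n^2}\right) + \exp\left(-\frac{z_n - \theta_n}{\sigma_n^2}\right) \right). \tag{15}$$

Again, one can note that the terms $\log(2)$ and $\frac{1}{2\sigma_n^2}$ do not depend on the true 
hypothesis. That is why, for the same reasons as those discussed in connection 
with Equation (12), these terms do not play any role in solving the detection 
problem (8). For the sake of clarity, these terms are thus omitted: in the 
following, the log-LR $\Lambda_2(z_n)$ denotes the right-hand side of (15). 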
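Once the constant terms are dropped, the per-pixel log-LR is the logarithm of a sum of two exponentials with opposite arguments. A direct evaluation overflows when the noise variance is small, so the sketch below uses the algebraically equivalent form |x| + log(1 + exp(-2|x|)). The function name `log_lr2` is an illustrative choice.

```python
import math


def log_lr2(z, theta, sigma):
    """log(exp(x) + exp(-x)) with x = (z - theta) / sigma**2, constants omitted."""
    x = abs(z - theta) / sigma ** 2
    # log(e^x + e^-x) = x + log(1 + e^(-2x)) for x >= 0; this form cannot overflow.
    return x + math.log1p(math.exp(-2.0 * x))


print(log_lr2(128, 128.0, 1.0))  # pixel equal to its expectation: log(2)
print(log_lr2(131, 128.0, 1.0))  # three grey levels away from the expectation
```

The statistic is symmetric around the pixel expectation and grows with the distance to it, which is why an increase of the pixel variance caused by embedding raises its value on average.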



4 Statistical Performance of the LR test. 
4.1 Case of simple hypotheses, when R = 2. 

In this section it is first proposed to study the statistical performance for the case 
of simple hypotheses, when $R = 2$. The results are then extended to the general 
case of $R \in ]0; 1[$ in Section 4.2. To easily calculate the statistical performance 
of the LR test $\delta^R$ (10), the asymptotic approach is of crucial interest. Moreover, 
the assumption that $N$ grows to infinity is relevant in practice due to the very 
large number of pixels in typical images. 

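For the sake of clarity, let the mean expectation and the mean variance of 
$\Lambda_2(z_n)$ be denoted $\mu_0$ and $\sigma_0^2$ under hypothesis $\mathcal{H}_0$, and $\mu_2$ and $\sigma_2^2$ under 
hypothesis $\mathcal{H}_1$ with $R = 2$: 

$$\mu_i = \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}_i\left[\Lambda_2(z_n)\right] \quad \text{and} \quad \sigma_i^2 = \frac{1}{N} \sum_{n=1}^{N} \mathrm{Var}_i\left[\Lambda_2(z_n)\right], \tag{16}$$

where $\mathbb{E}_i[\Lambda_2(z_n)]$ and $\mathrm{Var}_i[\Lambda_2(z_n)]$ are respectively the expectation and the 
variance of $\Lambda_2(z_n)$ under the corresponding hypothesis. 

The test $\delta_2$ associated with the "normalized" log-LR $\overline{\Lambda}_2(Z)$ is defined as: 

$$\delta_2(Z) = \begin{cases} \mathcal{H}_0 & \text{if } \overline{\Lambda}_2(Z) \leq \tau_{\alpha_0}, \\ \mathcal{H}_1 & \text{if } \overline{\Lambda}_2(Z) > \tau_{\alpha_0}, \end{cases} \quad \text{with} \quad \overline{\Lambda}_2(Z) = \frac{1}{\sqrt{N \sigma_0^2}} \sum_{n=1}^{N} \left(\Lambda_2(z_n) - \mu_0\right). \tag{17}$$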



It can be noted that the random variables $\Lambda_2(z_n)$ are assumed statistically 
independent and, for any $\sigma_n > 0$, have finite expectation and variance, which implies 
that the conditions necessary for application of Lindeberg's central limit 
theorem [22, Theorem 11.2.5] are satisfied. These conditions can also be shown by 
using the fact that the $z_n$ are bounded because they can only take values in the set 
$\mathcal{Z}$. Therefore, 

$$\overline{\Lambda}_2(Z) \rightsquigarrow \begin{cases} \mathcal{N}(0, 1) & \text{under } \mathcal{H}_0, \\ \mathcal{N}\left( \dfrac{\sqrt{N}(\mu_2 - \mu_0)}{\sigma_0},\ \dfrac{\sigma_2^2}{\sigma_0^2} \right) & \text{under } \mathcal{H}_1, \end{cases} \tag{18}$$

where $\rightsquigarrow$ represents the convergence in distribution as $N \to \infty$. From 
Equation (18), a short algebra establishes the following theorem. 

Fig. 1: Graphical representation of the two first moments of the log-LR 
$\log(\Lambda_2(z_n))$ (20)-(23). Presented results correspond to the case of i.i.d. pixels 
with expectation $\theta_n \in [126; 130]$ and standard deviation $\sigma_n = 0.75$. 



Theorem 1. For any given probability of false alarm $\alpha_0 \in ]0; 1[$, the decision 
threshold $\tau_{\alpha_0}$ given by: 

$$\tau_{\alpha_0} = \Phi^{-1}(1 - \alpha_0) \tag{19}$$

where $\Phi^{-1}(\cdot)$ is the Gaussian inverse cumulative distribution, asymptotically 
warrants that the test $\delta_2$ (17) is in $\mathcal{K}_{\alpha_0}$. 

The main conclusion of Theorem 1 is that the decision threshold $\tau_{\alpha_0}$ depends 
neither on the embedding rate $R$ nor on the image parameters $\theta_n$ and $\sigma_n$. Hence, by 
using the "normalized" log-LR $\overline{\Lambda}_2(Z)$, the same threshold permits us to respect 
a prescribed false-alarm probability $\alpha_0$ whatever the analyzed image and the 
embedding rate are. 
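The decision rule with a threshold that depends only on the prescribed false-alarm probability can be sketched as follows. The sketch assumes that the per-pixel log-LR values and the null moments (mean and standard deviation under the cover hypothesis) are already available; the function and argument names are illustrative.

```python
import math
from statistics import NormalDist


def threshold(alpha0):
    """Decision threshold: inverse standard Gaussian cdf evaluated at 1 - alpha0."""
    return NormalDist().inv_cdf(1.0 - alpha0)


def normalized_statistic(log_lrs, mu0, sigma0):
    """Centered sum of per-pixel log-LRs divided by sqrt(N) * sigma0."""
    n = len(log_lrs)
    return sum(value - mu0 for value in log_lrs) / (math.sqrt(n) * sigma0)


def decide(log_lrs, mu0, sigma0, alpha0):
    """Return True when the stego hypothesis is accepted."""
    return normalized_statistic(log_lrs, mu0, sigma0) > threshold(alpha0)


print(round(threshold(0.05), 4))  # 1.6449
```

Because the statistic is asymptotically standard Gaussian under the cover hypothesis, the same threshold can be reused for every image, which is the practical benefit of the normalization.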



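Equation (18) also implies that, to asymptotically calculate the detection power 
of the LR test $\delta_2$ (17), one only needs to calculate the first two moments of 
$\Lambda_2(Z)$. The mean expectations of the log-LR $\Lambda_2(z_n)$ are given under hypotheses 
$\mathcal{H}_0$ and $\mathcal{H}_1$ by 

$$\mu_0 = \frac{1}{N} \sum_{n=1}^{N} \sum_{k \in \mathcal{Z}} p_{\theta_n}[k] \log\left( \exp\left(\frac{k - \theta_n}{\sigma_n^2}\right) + \exp\left(-\frac{k - \theta_n}{\sigma_n^2}\right) \right), \tag{20}$$

$$\mu_2 = \frac{1}{N} \sum_{n=1}^{N} \sum_{k \in \mathcal{Z}} q^2_{\theta_n}[k] \log\left( \exp\left(\frac{k - \theta_n}{\sigma_n^2}\right) + \exp\left(-\frac{k - \theta_n}{\sigma_n^2}\right) \right), \tag{21}$$

where the probabilities $p_{\theta_n}[k]$ and $q^2_{\theta_n}[k]$ are respectively defined in (3) and (7), 
the latter with $R = 2$. Similarly, the mean variances are by definition given under 
both hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$ by: 

$$\sigma_0^2 = \frac{1}{N} \sum_{n=1}^{N} \sum_{k \in \mathcal{Z}} p_{\theta_n}[k] \left( \log\left( \exp\left(\frac{k - \theta_n}{\sigma_n^2}\right) + \exp\left(-\frac{k - \theta_n}{\sigma_n^2}\right) \right) - \mu_0 \right)^2, \tag{22}$$

$$\sigma_2^2 = \frac{1}{N} \sum_{n=1}^{N} \sum_{k \in \mathcal{Z}} q^2_{\theta_n}[k] \left( \log\left( \exp\left(\frac{k - \theta_n}{\sigma_n^2}\right) + \exp\left(-\frac{k - \theta_n}{\sigma_n^2}\right) \right) - \mu_2 \right)^2. \tag{23}$$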



The expectations $\mu_0$ and $\mu_2$ and the variances $\sigma_0^2$ and $\sigma_2^2$ as functions of $\theta_n$ are 
respectively drawn in Figures 1a and 1b. These figures highlight the fact that 
the pixel expectation $\theta_n$ can have a significant impact on the LR moments, and 
hence on the detection power, particularly when $\sigma_n < 1$. However, a thorough 
study of equations (20)-(23) shows that this phenomenon rapidly tends to be 
negligible when $\sigma_n > 1$. 

Even though the moments given in (20)-(23) have a rather complicated 
expression, their numerical calculation is straightforward as long as the parameters 
$\theta_n$ and $\sigma_n$ are known. 
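To illustrate how straightforward this numerical calculation is, the sketch below evaluates the two first moments of the log-LR for i.i.d. pixels sharing one expectation and one standard deviation, in which case the averages over pixels reduce to a single per-pixel term. It assumes 8-bit pixels and the R = 2 stego pmf (half of the mass moved to each neighbour); the helper names and the parameter values are illustrative.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def cover_pmf(theta, sigma, bits=8):
    """Quantized Gaussian pmf of a cover pixel."""
    return [_STD.cdf((k + 0.5 - theta) / sigma) - _STD.cdf((k - 0.5 - theta) / sigma)
            for k in range(2 ** bits)]


def stego_pmf_r2(p):
    """Stego pmf for R = 2: q[k] = (p[k-1] + p[k+1]) / 2, zero mass outside Z."""
    size = len(p)
    return [0.5 * ((p[k - 1] if k > 0 else 0.0) + (p[k + 1] if k < size - 1 else 0.0))
            for k in range(size)]


def log_lr2(k, theta, sigma):
    """log(exp(x) + exp(-x)) with x = (k - theta) / sigma**2, overflow-free."""
    x = abs(k - theta) / sigma ** 2
    return x + math.log1p(math.exp(-2.0 * x))


def moments(pmf, theta, sigma):
    """Mean and variance of the log-LR when the pixel value follows pmf."""
    values = [log_lr2(k, theta, sigma) for k in range(len(pmf))]
    mean = sum(w * v for w, v in zip(pmf, values))
    var = sum(w * (v - mean) ** 2 for w, v in zip(pmf, values))
    return mean, var


theta, sigma = 128.0, 1.0  # hypothetical i.i.d. pixel parameters
p = cover_pmf(theta, sigma)
mu0, var0 = moments(p, theta, sigma)
mu2, var2 = moments(stego_pmf_r2(p), theta, sigma)
print(mu0, var0, mu2, var2)
```

For non-identically distributed pixels the same computation would be repeated for each pixel and averaged.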

From the asymptotic distribution (18) of the log-LR $\overline{\Lambda}_2(Z)$ and the 
expressions (20)-(23) of its two first moments, the detection power of the LR test 
$\delta_2$ (17) is given by the following theorem. 

Theorem 2. For any $\alpha_0 \in ]0; 1[$, assuming that the parameters $\{\theta_n\}_{n=1}^{N}$ and 
$\{\sigma_n\}_{n=1}^{N}$ are known, the power function $\beta_{\delta_2}$ associated with the test $\delta_2$ (17) is 
asymptotically given, as $N \to \infty$, by: 

$$\beta_{\delta_2} = 1 - \Phi\left( \frac{\sigma_0}{\sigma_2}\, \Phi^{-1}(1 - \alpha_0) - \frac{\sqrt{N}(\mu_2 - \mu_0)}{\sigma_2} \right). \tag{24}$$



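Proof. Using the result (18), it asymptotically holds that for any $\tau_{\alpha_0} \in \mathbb{R}$: 

$$\alpha_0(\delta_2) = \mathbb{P}_0\left[\overline{\Lambda}_2(Z) > \tau_{\alpha_0}\right] = 1 - \Phi(\tau_{\alpha_0}).$$

Hence, because $\Phi$ is strictly increasing, one has: 

$$1 - \alpha_0(\delta_2) = \Phi(\tau_{\alpha_0}) \;\Leftrightarrow\; \tau_{\alpha_0} = \Phi^{-1}\left(1 - \alpha_0(\delta_2)\right), \tag{25}$$

which proves Theorem 1. 

It also follows from (18) that for any decision threshold $\tau_{\alpha_0} \in \mathbb{R}$ the power of 
the test $\delta_2$ (17) is given by: 

$$\beta_{\delta_2} = \mathbb{P}_1\left[\overline{\Lambda}_2(Z) > \tau_{\alpha_0}\right] = 1 - \Phi\left( \frac{\sigma_0}{\sigma_2}\, \tau_{\alpha_0} - \frac{\sqrt{N}(\mu_2 - \mu_0)}{\sigma_2} \right).$$

By substituting $\tau_{\alpha_0}$ by the value given in Theorem 1, a short algebra leads to 
the relation (24). This proves Theorem 2 and concludes the proof. 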
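The asymptotic power for R = 2 is easy to evaluate once the four moments are known. In the sketch below the statistic is taken to be asymptotically Gaussian under the alternative, with mean sqrt(N) (mu2 - mu0) / sigma0 and standard deviation sigma2 / sigma0; the function name and the moment values used in the example are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def power_r2(alpha0, n_pixels, mu0, mu2, sigma0, sigma2):
    """Asymptotic power of the normalized test when every pixel is modified."""
    tau = _STD.inv_cdf(1.0 - alpha0)  # threshold set by the false-alarm rate
    shift = math.sqrt(n_pixels) * (mu2 - mu0) / sigma2
    return 1.0 - _STD.cdf(sigma0 / sigma2 * tau - shift)


# Hypothetical moments, for illustration only.
for n_pixels in (100, 1000, 10000):
    print(n_pixels, power_r2(0.05, n_pixels, 0.9, 1.0, 0.8, 0.9))
```

When the two hypotheses share the same moments the power reduces to the false-alarm probability, and it increases with the number of pixels as soon as mu2 exceeds mu0.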



4.2 General case of R ∈ ]0; 1[. 

The case for which the embedding rate $R$ can take any value in $]0; 1]$ is treated 
in a similar manner as the case $R = 2$. The problem of designing an optimal test 
has been shown to be particularly difficult in (55] . A thorough design of an MP test 
uniformly with respect to the embedding rate lies outside of the scope of this 
paper, which mainly studies the MP test for $R = 2$ and its practical 
implementation. Hence, it is proposed to use the test $\delta_2$ (17) whatever the embedding rate 
$R$ might be. Once again, the asymptotic distribution (18) is used to solve the 
decision problem (8). 

The alternative hypothesis $\mathcal{H}_R$, that $Z$ contains a stego-medium with 
embedding rate $R \in ]0; 1]$, can be considered as a combination of stego and cover 
pixels. Hence, the use of the law of total expectation and the law of total 
variance is relevant to calculate the two first moments of the log-LR $\Lambda_2(Z)$. Using 
the moments given in (20)-(23), for the case $R = 2$, a short calculation gives: 
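
$$\mu_R = \frac{R}{2}\, \mu_2 + \left(1 - \frac{R}{2}\right) \mu_0, \tag{26}$$

$$\sigma_R^2 = \frac{R}{2} \left(\sigma_2^2 + \mu_2^2\right) + \left(1 - \frac{R}{2}\right) \left(\sigma_0^2 + \mu_0^2\right) - \mu_R^2. \tag{27}$$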



In other words, by using the test $\delta_2$ (17) for any $R \in ]0; 1]$ only the 
detection power is impacted. Indeed, the null hypothesis does not change, hence the 
asymptotic distribution (18) of the LR $\overline{\Lambda}_2(Z)$ under $\mathcal{H}_0$ as well as the decision 
threshold $\tau_{\alpha_0}$ (19) remain the same. This point is highlighted in the following 
theorem. 

Theorem 3. For any $\alpha_0 \in ]0; 1[$, assuming that the parameters $\{\theta_n\}_{n=1}^{N}$ and 
$\{\sigma_n\}_{n=1}^{N}$ are known, the power function $\beta_{\delta_R}$ associated with the test $\delta_2$ (17) is 
asymptotically given for any $R \in ]0; 1]$ by: 

$$\beta_{\delta_R} = 1 - \Phi\left( \frac{\sigma_0}{\sigma_R}\, \Phi^{-1}(1 - \alpha_0) - \frac{\sqrt{N}(\mu_R - \mu_0)}{\sigma_R} \right). \tag{28}$$
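The extension to an arbitrary embedding rate can be sketched numerically. The sketch assumes that, at rate R, a pixel behaves like an R = 2 stego pixel with probability R/2 and like a cover pixel otherwise, so that the moments combine through the laws of total expectation and total variance; the asymptotic power is then obtained as in the R = 2 case with the mixture moments. Function names and the moment values in the example are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def mixture_moments(rate, mu0, var0, mu2, var2):
    """Mean and variance of the log-LR at embedding rate `rate` (mixture weight rate/2)."""
    w = rate / 2.0
    mu_r = w * mu2 + (1.0 - w) * mu0
    var_r = w * (var2 + mu2 ** 2) + (1.0 - w) * (var0 + mu0 ** 2) - mu_r ** 2
    return mu_r, var_r


def power(alpha0, n_pixels, rate, mu0, var0, mu2, var2):
    """Asymptotic power of the normalized test at embedding rate `rate`."""
    mu_r, var_r = mixture_moments(rate, mu0, var0, mu2, var2)
    sigma0, sigma_r = math.sqrt(var0), math.sqrt(var_r)
    tau = _STD.inv_cdf(1.0 - alpha0)
    shift = math.sqrt(n_pixels) * (mu_r - mu0) / sigma_r
    return 1.0 - _STD.cdf(sigma0 / sigma_r * tau - shift)


# Hypothetical moments, for illustration only.
for rate in (0.1, 0.5, 1.0):
    print(rate, power(0.05, 1000, rate, 0.9, 0.64, 1.0, 0.81))
```

Setting the rate to 2 recovers the R = 2 moments, while a rate of 0 gives back the cover moments and a power equal to the false-alarm probability.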



The power functions $\beta_{\delta_R}$ for $N = 1000$, $R = 0.1$, $\sigma_n = 0.5$ and $\theta_n = \{127.5; 128\}$ are 
drawn in Figure 2a. Once again, this figure highlights the potentially significant 
impact of pixel expectation on the performance of the test $\delta_2$. 

Fig. 2: Illustration of LRT statistical performance, false-alarm probabilities and 
detection power, for $N = 1000$ pixels, $R = 0.1$, $\sigma_n = 0.5$ and $\theta = \{127.5; 128\}$. 
The empirical results were obtained with $5 \cdot 10^4$ realizations. 

It should be highlighted that the most powerful property of the test $\delta_2$ is difficult 
to prove for $R \in ]0; 1[$, see [§]. However, Figure 3 emphasizes the relevance of the 
proposed approach, which consists in designing a test for $R = 2$ and extending 
its application to $R \in ]0; 1[$. Here, the power function of the proposed test is 
compared with the power function of the clairvoyant detector, which knows $R$. 
The numerical comparison presented in Figure 3 shows that the loss of power 
is negligible. 

Finally, it can be noted that the detection power as given in Theorem 3 
complies with the square root law of steganographic capacity [5D]. Indeed, from (28), 
a short algebra immediately permits us to establish that: 

$$\lim_{R\sqrt{N} \to \infty} \beta_{\delta_R} = 1 \quad \text{and} \quad \lim_{R\sqrt{N} \to 0} \beta_{\delta_R} = \alpha_0. \tag{29}$$
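The "short algebra" can be made explicit under the assumption that the moments at rate $R$ are the mixture of the cover moments and the $R = 2$ moments with weight $R/2$, so that $\mu_R - \mu_0 = \frac{R}{2}(\mu_2 - \mu_0)$. The argument of $\Phi$ in the power function of Theorem 3 then reads:

```latex
\frac{\sigma_0}{\sigma_R}\,\Phi^{-1}(1-\alpha_0)
 - \frac{\sqrt{N}\,(\mu_R-\mu_0)}{\sigma_R}
= \frac{\sigma_0}{\sigma_R}\,\Phi^{-1}(1-\alpha_0)
 - R\sqrt{N}\;\frac{\mu_2-\mu_0}{2\,\sigma_R}.
```

Since $\sigma_R$ stays bounded and tends to $\sigma_0$ as $R \to 0$, the payload enters the power only through the product $R\sqrt{N}$: when this product vanishes the argument tends to $\Phi^{-1}(1 - \alpha_0)$ and the power to $\alpha_0$, whereas when it grows without bound (with $\mu_2 > \mu_0$) the argument tends to $-\infty$ and the power to 1.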



5 Practical implementation of proposed LR test. 



In practice, the application of the test $\delta_2$ (17) is compromised because neither 
the expectation $\theta_n$ nor the variance $\sigma_n^2$ of pixels are known: their estimated 
values, denoted $\widehat{\theta}_n$ and $\widehat{\sigma}_n^2$, respectively, have to be used instead. 

Accurate estimation of the parameters $\theta_n$ and $\sigma_n$ is a difficult problem, 
yet it is necessary to obtain a high detection performance. This problem also 
occurs in LSB replacement steganalysis. An efficient yet simple way to overcome 
this problem was introduced in the well-known Weighted Stego-image 
steganalysis (WS), initially proposed in [T3]. The authors propose to locally estimate 
the parameter $\theta_n$ by filtering the inspected image so that $\widehat{\theta}_n$ corresponds to the 
mean of the four surrounding pixels. Similarly, the local variance of the four 
surrounding pixels is used to estimate $\sigma_n^2$. The WS method has been studied 
thoroughly in [2T] and two major improvements have been proposed. First, the 
authors have empirically enhanced the estimation of pixel expectations by 
testing different local filters. Second, the authors proposed to use moderated weights 
$w_n = \widehat{\sigma}_n^2 + \alpha$, $\alpha > 0$, instead of the variance estimate $\widehat{\sigma}_n^2$. 

Fig. 3: Numerical comparison between the proposed LR test $\delta_2$ (17) and the 
clairvoyant detector which knows the embedding rate $R = 0.1$ and, thus, uses the 
LR test designed for this rate. Results were obtained from a Monte-Carlo 
simulation with $5 \cdot 10^4$ realizations using the Lena image cropped to 128 x 128 pixels and 
the addition of a Gaussian white noise with $\sigma = 2$. 
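The local estimation from the four surrounding pixels can be sketched as follows. This is only an illustration of the four-neighbour idea described above, not the refined filters studied in the WS literature: border pixels are skipped and the local variance is computed with the biased (divide by 4) estimator, both of which are assumptions of this sketch, as is the function name.

```python
def local_estimates(image):
    """Estimate the expectation and variance of each interior pixel from its 4 neighbours.

    `image` is a list of rows of grey levels; border pixels are left as None.
    """
    rows, cols = len(image), len(image[0])
    theta_hat = [[None] * cols for _ in range(rows)]
    var_hat = [[None] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neigh = (image[i - 1][j], image[i + 1][j], image[i][j - 1], image[i][j + 1])
            mean = sum(neigh) / 4.0
            theta_hat[i][j] = mean
            # Assumption: biased variance of the 4 neighbours around their mean.
            var_hat[i][j] = sum((v - mean) ** 2 for v in neigh) / 4.0
    return theta_hat, var_hat


image = [[0, 10, 0],
         [20, 99, 40],
         [0, 30, 0]]
theta_hat, var_hat = local_estimates(image)
print(theta_hat[1][1], var_hat[1][1])  # 25.0 125.0
```

Note that the inspected pixel itself does not enter its own estimates, so an embedding change on that pixel leaves them untouched, although changes on the neighbours do affect them.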

In the present paper, it is proposed to use the WS filtering method to 
estimate the parameters $\theta_n$ and $\sigma_n^2$. Note that the proposed practical test is not 
optimal but intends to show the relevance of the proposed approach and the 
feasibility of designing a practical efficient test. Following the WS method, the practical 
implementation of the LR test $\delta_2$ proposed in this paper estimates each $\theta_n$ by 
filtering the inspected image with the kernel: 




Contrary to what is suggested in [3T], for the case of LSB replacement, our 
numerical experiments indicate that the detection performance tends to get worse 
when using the moderated weights instead of the estimated variance. Our 
interpretation of this phenomenon is as follows. The proposed LR test (17) essentially 
relies on the increase of pixels' variance due to insertion of hidden information. 
Hence, the use of moderated weights tends to fundamentally bias the test and 
degrades the performance results. Figure 4a offers an example of this phenomenon 
through a comparison of ROC curves obtained using 10 000 images from the 
BOSSbase database with $R = 1/2$ and $\alpha = \{1/4; 1/2; 3/4; 1\}$. 
On the other hand, the direct use of the estimated variance $\widehat{\sigma}_n^2$ may lead to 
numerical instability, particularly in flat image areas. Hence, it was chosen to add 
$\alpha = 1/4$ to the estimated variance in our numerical experiments. 

(a) ROC obtained with four different weight factors: $\alpha = \{1/4; 1/2; 3/4; 1\}$. 

Fig. 4: Impact of weights and calibration on proposed test performance. ROC 
curves obtained using the images from BOSS database [3] with $R = 0.5$. 



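By using these estimated values in expression (15), the estimated log-LR 
$\widehat{\Lambda}_2(z_n)$ becomes: 

$$\widehat{\Lambda}_2(z_n) = \log\left( \exp\left(\frac{z_n - \widehat{\theta}_n}{\alpha + \widehat{\sigma}_n^2}\right) + \exp\left(-\frac{z_n - \widehat{\theta}_n}{\alpha + \widehat{\sigma}_n^2}\right) \right). \tag{30}$$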
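The estimated log-LR can be sketched by plugging the estimates into the per-pixel statistic, with the constant 1/4 added to the estimated variance as stated above. The sketch assumes the estimates are given (for instance from a local filter) and uses the overflow-free form of the logarithm; the function name is illustrative.

```python
import math


def estimated_log_lr(z, theta_hat, var_hat, alpha=0.25):
    """Per-pixel log-LR with estimated parameters; alpha regularizes flat areas."""
    x = abs(z - theta_hat) / (alpha + var_hat)
    # log(e^x + e^-x) = x + log(1 + e^(-2x)) for x >= 0.
    return x + math.log1p(math.exp(-2.0 * x))


# In a perfectly flat area (estimated variance 0) the statistic stays finite.
print(estimated_log_lr(101, 100.0, 0.0))
print(estimated_log_lr(101, 100.0, 4.0))
```

Without the added constant, a zero estimated variance would make the argument infinite for any pixel that differs from its estimated expectation, which is the numerical instability mentioned above.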



It should be highlighted that some difficult problems still remain open. 

First, the normalization of the log-LR, suggested in Equation (17), requires the 
calculation of the expectation $\mu_0$ and the variance $\sigma_0^2$ of the log-LR. 
Unfortunately, the estimates of the parameters $\sigma_n$ are, in practice, not accurate enough 
to perform this normalization efficiently. 

Second, possibly the most difficult problem is that the statistical interference 
between the cover image and the hidden information should be taken into account. 
For instance it was proposed in [26] to remove the LSB plane in order to remove 
any potential stego-noise. For LSB matching this is not possible. Therefore, the 
impact of hidden information on the estimators $\widehat{\theta}_n$ and $\widehat{\sigma}_n$ should be studied. Since 
the proposed test relies mainly on the slight increase of pixels' variance due 
to data hiding, the embedding changes may have an important effect on the 
estimates $\widehat{\sigma}_n$ and on the proposed test. 

As explained above, proper normalization of the proposed test is critical in 
practice. Even though the proposed LR is very sensitive to hidden information, if 
its expectation cannot be set to a fixed value under $\mathcal{H}_0$, the between-image-error 
described in [5] may negatively impact the test accuracy. Numerical simulations 
show that the expectation of the LR $\Lambda_2(z_n)$ can be roughly approximated by 




(a) Digital image used for the Monte-Carlo simulations. 
(b) Power of the test $\delta_2$ (31) as a function of pixel number for different 
false-alarm probabilities: theory and simulation. 

Fig. 5: Numerical verification of theoretical results through Monte-Carlo 
simulation based on the natural image shown in Figure 5a. 



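Therefore, the practical test proposed in the present paper is given as: 

$$\delta^\star(Z) = \begin{cases} \mathcal{H}_0 & \text{if } \Lambda^\star(Z) \leq \tau_{\alpha_0}, \\ \mathcal{H}_1 & \text{if } \Lambda^\star(Z) > \tau_{\alpha_0}, \end{cases} \tag{31}$$

$$\text{with} \quad \Lambda^\star(Z) = \sum_{n=1}^{N} \left( \widehat{\Lambda}_2(z_n) - \log(2) - \ldots \right). \tag{32}$$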

One can note that, contrary to the LR statistically studied throughout 
Sections 4.1-4.2, the proposed decision statistic is not normalized. Indeed the 
variance of $\widehat{\Lambda}_2(z_n)$ is not taken into account in Equation (31). This is because the 
estimation of pixels' variance is particularly difficult and the method used in 
this paper is not accurate enough. In fact, normalization can even lower the 
detection performance. The most notable thing about the test (31) is that the 
expectation of the decision statistic $\Lambda^\star(Z)$ is always 0 under hypothesis $\mathcal{H}_0$. 
Figure 4b shows an example of the detection power obtained with the two tests 
based on the statistics (30) and (31). 



6 Numerical Simulations. 

6.1 Theoretical results on simulated data. 

Figure 5 presents a numerical verification of Theorem 3. The image shown in 
Figure 5a has been analyzed $5 \cdot 10^4$ times. Each run was preceded by the addition 
of a zero-mean Gaussian noise whose standard deviation was $\sigma = 1$. The 
embedded hidden information was drawn from a binomial distribution $\mathcal{B}(1, 1/2)$ with 
an embedding rate $R = 1$. The empirical power of the test $\delta_2$ is compared with 
the theoretical result given by Theorem 3 for three different false-alarm 
probabilities: $\alpha_0 = \{10^{-1}; 10^{-2}; 10^{-3}\}$. Observe that the obtained detection power 
almost perfectly corresponds to the theoretical results. 

(a) ROC curves for R = 0.5. (b) ROC curves for R = 1. 

Fig. 6: Numerical comparisons of detectors performance using BOSS database [3]. 

Note that it is crucial to use the same image for this Monte-Carlo simulation 
because the detection power of the proposed test depends on image parameters, 
namely on $\theta_n$ and particularly on $\sigma_n^2$. Hence, for a different image, the detection 
power may differ significantly as explained in Section 4. Moreover, the use of 
the same image artificially permits us to overcome the difficult problem of 
normalizing the log-LR and, thus, the effects of the between-image-error described 
in [5]. 



6.2 Comparison with the state of the art on real images. 



Matlab source code of the proposed test, as detailed in Equation (31), is available 
on the Internet at: http://remi.cogranne.pagesperso-orange.fr/. 

One of the main motivations for this paper was to show that the hypothesis 
testing theory can be applied in practice to design an efficient LSB matching 
detector. This fact can only be shown by a numerical comparison with 
state-of-the-art detectors on large image databases. The potential competitors for LSB 
matching detection are not as numerous as for LSB replacement. As briefly 
described in the introduction, the operational context selected in this paper 
eliminates all prior-art detectors based on machine learning. Almost every other 
detector found in the literature is based on the image histogram. For the present 
comparison, two histogram-based detectors, namely ALE [25] and the adjacency 
HCF COM [T5] detector, were used due to their high detection performance. 
Figure 6 shows the results obtained with 10 000 images from the BOSSbase contest 
database [3]. Each hidden bit was drawn from a binomial distribution $\mathcal{B}(1, 1/2)$. 
The embedding rate was $R = 0.5$ in Figure 6a and $R = 1$ in Figure 6b. Both 
figures show that the proposed test achieves a better detection power for any 
prescribed false-alarm probability. 

Similarly, Figure 7 shows the results obtained with the 1488 raw images 
from the 'Dresden Image Database' [16]. Prior to our experiments, each image 
was converted to an unprocessed TIFF format (using dcraw) and only the red 
color channel was used. The embedding rate was $R = 0.25$ in Figure 7a and 
$R = 0.5$ in Figure 7b. The results presented in Figures 7a and 7b confirm that 
the proposed test has a better detection power for any prescribed false-alarm 
probability. Moreover, by changing the embedding rate, the combined results of 
Figures 6 and 7 show that the proposed test also performs better than prior art 
for every tested $R$. 

Note that, surprisingly, the detection power of the proposed test is slightly 
higher for the BOSSbase database than for the Dresden database for $R = 0.5$, 
see Figures 6a and 7b respectively, whereas the Dresden database images are 
bigger. This phenomenon can be explained by the fact that the Dresden database 
images are RAW images that have not been further processed. In contrast, 
BOSSbase images have been downsampled, which may introduce correlations 
between neighboring pixels that implicitly make the filtering estimator more 
efficient. 

7 Conclusion and future works. 

The first step to fill the gap between hypothesis testing theory and steganalysis 
was recently proposed in |12I7I26| . This paper extends this first step to the case 
of LSB matching. By casting the problem of LSB matching steganalysis in the 
framework of hypothesis testing theory, the most powerful likelihood ratio test 
is designed. Then, a thorough statistical study permits analytical calculations 
of its performance in terms of the false-alarm probability and detection power. 
To apply this test in practice, unknown image parameters have to be estimated. 
Based on a simple estimation of these unknown parameters, a practical test is 
proposed. 

(a) ROC curves for R = 0.25. (b) ROC curves for R = 0.5. 

Fig. 7: Comparisons of detectors performance using Dresden database [16]. 

The relevance of the proposed approach is emphasized through numerical ex- 
periments. Compared to two leading histogram-based detectors, the proposed 
practical test achieves a better detection power. 

However, the practical test presented in this paper relies on a simple yet 
efficient filtered version of inspected media to estimate pixel expectations and 
variances. In our future work, a more efficient model should be used to increase 
the detection power. Lastly, a thorough statistical study of the impact of this 
estimation on detection performance is desirable to complete the present work. 